Search for: All records

Creators/Authors contains: "Lam, Monica S."

« Prev Next »

Total Resources

11

Resource Type
Conference Paper

11

Conference Proceeding

0

Dataset

0

Journal Article

0

Workshop Report

0

Availability
Full Text / Resource Available

8

Citation Only

3

Save Results
Excel (limit 2000)
CSV (limit 5000)
XML (limit 5000)

Have feedback or suggestions for a way to improve these results?
!

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Zero and Few-Shot Localization of Task-Oriented Dialogue Agents with a Distilled Representation

Moradshahi, Mehrad ; Semnani, Sina J ; Lam, Monica S. ( May 2023 , Proceedings of the conference Association for Computational Linguistics European Chapter Conference)

Task-oriented Dialogue (ToD) agents are mostly limited to a few widely-spoken languages, mainly due to the high cost of acquiring training data for each language. Existing low-cost approaches that rely on cross-lingual embeddings or naive machine translation sacrifice a lot of accuracy for data efficiency, and largely fail in creating a usable dialogue agent. We propose automatic methods that use ToD training data in a source language to build a high-quality functioning dialogue agent in another target language that has no training data (i.e. zero-shot) or a small training set (i.e. fewshot). Unlike most prior work in cross-lingual ToD that only focuses on Dialogue State Tracking (DST), we build an end-to-end agent. We show that our approach closes the accuracy gap between few-shot and existing fullshot methods for ToD agents. We achieve this by (1) improving the dialogue data representation, (2) improving entity-aware machine translation, and (3) automatic filtering of noisy translations. We evaluate our approach on the recent bilingual dialogue dataset BiToD. In Chinese to English transfer, in the zero-shot setting, our method achieves 46.7% and 22.0% in Task Success Rate (TSR) and Dialogue Success Rate (DSR) respectively. In the few-shot setting where 10% of the data in the target language is used, we improve the state-of-the-art by 15.2% and 14.0%, coming within 5% of full-shot training.
more » « less
Free, publicly-accessible full text available May 1, 2024
Contextual Semantic Parsing for Multilingual Task-Oriented Dialogues

Moradshahi, Mehrad ; Tsai, Victoria ; Campagna, Giovanni ; Lam, Monica S. ( May 2023 , Proceedings of the conference Association for Computational Linguistics European Chapter Conference)

Robust state tracking for task-oriented dialogue systems currently remains restricted to a few popular languages. This paper shows that given a large-scale dialogue data set in one language, we can automatically produce an effective semantic parser for other languages using machine translation. We propose automatic translation of dialogue datasets with alignment to ensure faithful translation of slot values and eliminate costly human supervision used in previous benchmarks. We also propose a new contextual semantic parsing model, which encodes the formal slots and values, and only the last agent and user utterances. We show that the succinct representation reduces the compounding effect of translation errors, without harming the accuracy in practice. We evaluate our approach on several dialogue state tracking benchmarks. On RiSAWOZ, CrossWOZ, CrossWOZ-EN, and MultiWOZ-ZH datasets we improve the state of the art by 11%, 17%, 20%, and 0.3% in joint goal accuracy. We present a comprehensive error analysis for all three datasets showing erroneous annotations can lead to misguided judgments on the quality of the model. Finally, we present RiSAWOZ English and German datasets, created using our translation methodology. On these datasets, accuracy is within 11% of the original showing that high-accuracy multilingual dialogue datasets are possible without relying on expensive human annotations. We release our datasets and software open source.
more » « less
Free, publicly-accessible full text available May 1, 2024
X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents

Moradshahi, Mehrad ; Shen, Tianhao ; Bali, Kalika ; Choudhury, Monojit ; de Chalendar, Gaël ; Goel, Anmol ; Kim, Sungkyun ; Kodali, Prashant ; Kumaraguru, Ponnurangam ; Semmar, Nasredine ; et al ( July 2023 , Findings of the Association for Computational Linguistics (ACL), Toronto, Canada, 2023)

Task-oriented dialogue research has mainly focused on a few popular languages like English and Chinese, due to the high dataset creation cost for a new language. To reduce the cost, we apply manual editing to automatically translated data. We create a new multilingual benchmark, X-RiSAWOZ, by translating the Chinese RiSAWOZ to 4 languages: English, French, Hindi, Korean; and a code-mixed English- Hindi language. X-RiSAWOZ has more than 18,000 human-verified dialogue utterances for each language, and unlike most multilingual prior work, is an end-to-end dataset for building fully-functioning agents. The many difficulties we encountered in creating X-RiSAWOZ led us to develop a toolset to accelerate the post-editing of a new language dataset after translation. This toolset improves machine translation with a hybrid entity alignment technique that combines neural with dictionary-based methods, along with many automated and semi-automated validation checks. We establish strong baselines for X-RiSAWOZ by training dialogue agents in the zero- and few-shot settings where limited gold data is available in the target language. Our results suggest that our translation and post-editing methodology and toolset can be used to create new high-quality multilingual dialogue agents cost-effectively. Our dataset,
more » « less
Free, publicly-accessible full text available July 1, 2024
HybridTrak: Adding Full-Body Tracking to VR Using an Off-the-Shelf Webcam

https://doi.org/10.1145/3491102.3502045

Yang, Jackie ; Chen, Tuochao ; Qin, Fang ; Lam, Monica S. ; Landay, James A. ( April 2022 , CHI '22: CHI Conference on Human Factors in Computing Systems)

Full-body tracking in virtual reality improves presence, allows interaction via body postures, and facilitates better social expression among users. However, full-body tracking systems today require a complex setup fixed to the environment (e.g., multiple lighthouses/cameras) and a laborious calibration process, which goes against the desire to make VR systems more portable and integrated. We present HybridTrak, which provides accurate, real-time full-body tracking by augmenting inside-out1 upper-body VR tracking systems with a single external off-the-shelf RGB web camera. HybridTrak uses a full-neural solution to convert and transform users’ 2D full-body poses from the webcam to 3D poses leveraging the inside-out upper-body tracking data. We showed HybridTrak is more accurate than RGB or depth-based tracking methods on the MPI-INF-3DHP dataset. We also tested HybridTrak in the popular VRChat app and showed that body postures presented by HybridTrak are more distinguishable and more natural than a solution using an RGBD camera.
more » « less
Full Text Available
DIY assistant: a multi-modal end-user programmable virtual assistant

https://doi.org/10.1145/3453483.3454046

Fischer, Michael H. ; Campagna, Giovanni ; Choi, Euirim ; Lam, Monica S. ( June 2021 , PLDI 2021: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation)

While Alexa can perform over 100,000 skills, its capability covers only a fraction of what is possible on the web. Individuals need and want to automate a long tail of web-based tasks which often involve visiting different websites and require programming concepts such as function composition, conditional, and iterative evaluation. This paper presents DIYA (Do-It-Yourself Assistant), a new system that empowers users to create personalized web-based virtual assistant skills that require the full generality of composable control constructs, without having to learn a formal programming language. With DIYA, the user demonstrates their task of interest in the browser and issues a few simple voice commands, such as naming the skills and adding conditions on the action. DIYA turns these multi-modal specifications into voice-invocable skills written in the ThingTalk 2.0 programming language we designed for this purpose. DIYA is a prototype that works in the Chrome browser. Our user studies show that 81% of the proposed routines can be expressed using DIYA. DIYA is easy to learn, and 80% of users surveyed find DIYA useful.
more » « less
Full Text Available
Grounding Open-Domain Instructions to Automate Web Support Tasks

Xu, Nancy: Masling ; Du, Michael ; Campagna, Giovanni ; Heck, Larry ; Landay, James ; Lam, Monica S. ( June 2021 , 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics)

Grounding natural language instructions on the web to perform previously unseen tasks enables accessibility and automation. We introduce a task and dataset to train AI agents from open-domain, step-by-step instructions originally written for people. We build RUSS (Rapid Universal Support Service) to tackle this problem. RUSS consists of two models: First, a BERT-LSTM with pointers parses instructions to ThingTalk, a domain-specific language we design for grounding natural language on the web. Then, a grounding model retrieves the unique IDs of any webpage elements requested in ThingTalk. RUSS may interact with the user through a dialogue (e.g. ask for an address) or execute a web operation (e.g. click a button) inside the web runtime. To augment training, we synthesize natural language instructions mapped to ThingTalk. Our dataset consists of 80 different customer service problems from help websites, with a total of 741 step-by-step instructions and their corresponding actions. RUSS achieves 76.7% end-to-end accuracy predicting agent actions from single instructions. It outperforms state-of-the-art models that directly map instructions to actions without ThingTalk. Our user study shows that RUSS is preferred by actual users over web navigation.
more » « less
Full Text Available
Schema2QA: High-Quality and Low-Cost Q&A Agents for the Structured Web

https://doi.org/10.1145/3340531.3411974

Xu, Silei ; Campagna, Giovanni ; Li, Jian ; Lam, Monica S. ( October 2020 , CIKM '20: The 29th ACM International Conference on Information and Knowledge Management)
null (Ed.)
Building a question-answering agent currently requires large annotated datasets, which are prohibitively expensive. This paper proposes Schema2QA, an open-source toolkit that can generate a Q&A system from a database schema augmented with a few annotations for each field. The key concept is to cover the space of possible compound queries on the database with a large number of in-domain questions synthesized with the help of a corpus of generic query templates. The synthesized data and a small paraphrase set are used to train a novel neural network based on the BERT pretrained model. We use Schema2QA to generate Q&A systems for five this http URL domains, restaurants, people, movies, books and music, and obtain an overall accuracy between 64% and 75% on crowdsourced questions for these domains. Once annotations and paraphrases are obtained for a this http URL schema, no additional manual effort is needed to create a Q&A agent for any website that uses the same schema. Furthermore, we demonstrate that learning can be transferred from the restaurant to the hotel domain, obtaining a 64% accuracy on crowdsourced questions with no manual effort. Schema2QA achieves an accuracy of 60% on popular restaurant questions that can be answered using this http URL. Its performance is comparable to Google Assistant, 7% lower than Siri, and 15% higher than Alexa. It outperforms all these assistants by at least 18% on more complex, long-tail questions.
more » « less
Full Text Available
DoThisHere: Multimodal Interaction to Improve Cross-Application Tasks on Mobile Devices

https://doi.org/10.1145/3379337.3415841

Yang, Jackie ; Lam, Monica S. ; Landay, James A. ( October 2020 , UIST '20: The 33rd Annual ACM Symposium on User Interface Software and Technology)
null (Ed.)
Many computing tasks, such as comparison shopping, two-factor authentication, and checking movie reviews, require using multiple apps together. On large screens, "windows, icons, menus, pointer" (WIMP) graphical user interfaces (GUIs) support easy sharing of content and context between multiple apps. So, it is straightforward to see the content from one application and write something relevant in another application, such as looking at the map around a place and typing walking instructions into an email. However, although today's smartphones also use GUIs, they have small screens and limited windowing support, making it hard to switch contexts and exchange data between apps. We introduce DoThisHere, a multimodal interaction technique that streamlines cross-app tasks and reduces the burden these tasks impose on users. Users can use voice to refer to information or app features that are off-screen and touch to specify where the relevant information should be inserted or is displayed. With DoThisHere, users can access information from or carry information to other apps with less context switching. We conducted a survey to find out what cross-app tasks people are currently performing or wish to perform on their smartphones. Among the 125 tasks that we collected from 75 participants, we found that 59 of these tasks are not well supported currently. DoThisHere is helpful in completing 95% of these unsupported tasks. A user study, where users are shown the list of supported voice commands when performing a representative sample of such tasks, suggests that DoThisHere may reduce expert users' cognitive load; the Query action, in particular, can help users reduce task completion time.
more » « less
Full Text Available
Soundr: Head Position and Orientation Prediction Using a Microphone Array

https://doi.org/10.1145/3313831.3376427

Yang, Jackie ; Banerjee, Gaurab ; Gupta, Vishesh ; Lam, Monica S. ; Landay, James A. ( April 2020 , CHI '20: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems)
null (Ed.)
Although state-of-the-art smart speakers can hear a user's speech, unlike a human assistant these devices cannot figure out users' verbal references based on their head location and orientation. Soundr presents a novel interaction technique that leverages the built-in microphone array found in most smart speakers to infer the user's spatial location and head orientation using only their voice. With that extra information, Soundr can figure out users references to objects, people, and locations based on the speakers' gaze, and also provide relative directions. To provide training data for our neural network, we collected 751 minutes of data (50x that of the best prior work) from human speakers leveraging a virtual reality headset to accurately provide head tracking ground truth. Our results achieve an average positional error of 0.31m and an orientation angle accuracy of 34.3° for each voice command. A user study to evaluate user preferences for controlling IoT appliances by talking at them found this new approach to be fast and easy to use.
more » « less
Full Text Available
Soteria: A Provably Compliant User Right Manager Using a Novel Two-Layer Blockchain Technology

https://doi.org/10.1109/IEEECONF47748.2020.9377624

Fu, Wei-Kang ; Lin, Yi-Shan ; Campagna, Giovanni ; Liu, Chun-Ting ; Tsai, De-Yi ; Mei, Chung-Huan ; Chang, Edward Y. ; Liao, Shih-Wei ; Lam, Monica S. ( October 2020 , 2020 IEEE Infrastructure Conference)
null (Ed.)
Soteria is a user right management system designed to safeguard user-data privacy in a transparent and provable manner in compliance to regulations such as GDPR and CCPA. Soteria represents user data rights as formal executable sharing agreements, which can automatically be translated into a human readable form and enforced as data are queried. To support revocation and to prove compliance, an indelible, audited trail of the hash of data access and sharing agreements are stored on a two-layer distributed ledger. The main chain ensures partition tolerance and availability (PA) properties while side chains ensure consistency and availability (CA), thus providing the three properties of the CAP (consistency, availability, and partition tolerance) theorem. Besides depicting the two-layer architecture of Soteria, this paper evaluates representative consensus protocols and recommends side-chain and inter-chain management strategies for improving latency and throughput.
more » « less
Full Text Available

« Prev Next »